---
title: "Chapter 2: The Time Series Object"
output: html_notebook
---

R provides several classes for representing time series data. Among them, `ts` is one of the most widely used, mainly because of its simplicity and its adoption by the main R packages for time series analysis, such as **forecast** and **stats**.

# The Natural Gas Consumption dataset

```{r}
library(pacman)
p_load(Quandl)

ngc <- Quandl(code = "FRED/NATURALGAS",
              collapse = "quarterly",
              type = "ts",
              end_date = "2018-12-31")

class(ngc)
```

The simplest way to plot a `ts` object is with the `plot.ts()` function:

```{r}
plot.ts(ngc,
        main = "US Quarterly Natural Gas Consumption",
        ylab = "Billions of Cubic Feet")
```

# The attributes of the `ts` class

A regular time series is an ordered sequence of observations captured at equally spaced time intervals. When the intervals are not equally spaced, the series is an irregular time series. The main characteristics of regular time series data are as follows:

-   Cycle/period: a regular unit of time that splits the series into consecutive, equally long subsets.

-   Frequency: the length of the cycle, that is, the number of units it contains.

-   Timestamp: the time at which each observation in the series was captured; it can be used as the series index.

A `ts` object is composed of two elements: the series values and their corresponding timestamps.

```{r}
# number of observations
length(ngc)
```
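
To make the two elements concrete, here is a minimal sketch on a small made-up quarterly series (the object `toy_ts` and its values are illustrative and are not part of the NGC data). Base R does not store a timestamp per observation. Instead, it keeps the time index compactly in the `tsp` attribute as the start, the end, and the frequency:

```{r}
# toy quarterly series with hypothetical values, 2020 Q1 to 2021 Q4
toy_ts <- ts(c(5, 7, 9, 11, 13, 15, 17, 19),
             start = c(2020, 1),
             frequency = 4)

# element 1: the series values as a plain numeric vector
toy_values <- as.numeric(toy_ts)

# element 2: the time index, stored as (start, end, frequency)
toy_tsp <- tsp(toy_ts)

toy_values
toy_tsp
```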

We can look at the structure of a `ts` object by printing it:

```{r}
ngc
```

Here each row represents a cycle and each column represents a cycle unit. For the `ngc` data, each calendar year is a full cycle and the quarters are the cycle units.

The `cycle()` and the `time()` functions from the **stats** package provide the cycle units and the timestamp of each observation in the series:

```{r}
cycle(ngc)

time(ngc)
```

A more concise way to get this information is with the `frequency()` and `deltat()` functions:

```{r}
frequency(ngc)

deltat(ngc)
```
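
These functions are closely related. As a rough sketch on a made-up quarterly series (`toy_ts` is an illustrative name), the output of `time()` and `cycle()` can be rebuilt from the start point, `frequency()`, and `deltat()` alone:

```{r}
# toy quarterly series with hypothetical values, starting in 2020 Q1
toy_ts <- ts(1:8, start = c(2020, 1), frequency = 4)

freq <- frequency(toy_ts)  # number of units per cycle
step <- deltat(toy_ts)     # time between observations, equal to 1 / frequency

# position of each observation, counted from zero
pos <- seq_along(toy_ts) - 1

# time index: decimal start time plus position times step
rebuilt_time <- tsp(toy_ts)[1] + pos * step

# cycle unit: offset by the starting unit, wrapped around the frequency
rebuilt_cycle <- (start(toy_ts)[2] - 1 + pos) %% freq + 1

all.equal(as.numeric(time(toy_ts)), rebuilt_time)
all.equal(as.numeric(cycle(toy_ts)), rebuilt_cycle)
```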

Other useful functions are `start()` and `end()`:

```{r}
start(ngc)

end(ngc)
```

The `ts_info()` function from the **TSstudio** package provides a concise summary of most of the functions above.

```{r}
p_load(TSstudio)

ts_info(ngc)
```

## Multivariate time series objects

When you have multivariate time series data, you need to use the `mts` (multiple time series) class. This combines the functionality of the `ts` and `matrix` classes.

```{r}
data("Coffee_Prices")
head(Coffee_Prices)

ts_info(Coffee_Prices)
```
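
The following sketch illustrates how an `mts` object behaves like both a `ts` object and a matrix. It uses two simulated series; the names `toy_matrix`, `price_a`, and `price_b` are made up for illustration and have nothing to do with the `Coffee_Prices` data:

```{r}
set.seed(123)

# two simulated monthly series stored as the columns of a matrix
toy_matrix <- cbind(price_a = rnorm(24, mean = 100, sd = 5),
                    price_b = rnorm(24, mean = 50, sd = 2))

# passing a matrix to ts() creates an mts object
toy_mts <- ts(toy_matrix, start = c(2020, 1), frequency = 12)

class(toy_mts)      # includes both "mts" and "ts"
dim(toy_mts)        # matrix dimensions: observations by variables
frequency(toy_mts)  # ts attributes apply to all columns

# matrix-style column selection returns a univariate ts object
toy_price_a <- toy_mts[, "price_a"]
is.ts(toy_price_a)
```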

## Creating a `ts` object

```{r}
my_ts1 <- ts(data = 1:60,
             start = c(2010, 1),
             end = c(2014, 12),
             frequency = 12)

ts_info(my_ts1)

my_ts1
```
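
Supplying both `start` and `end` is optional. As a small sketch with made-up values (the object names are illustrative), `ts()` infers the end point from the length of the data when `end` is omitted. When `start` and `end` imply more observations than the data contains, `ts()` recycles the values without a warning, so it is worth checking the length of the result:

```{r}
# six observations and a start point only: the end is inferred
inferred_end <- ts(1:6, start = c(2020, 1), frequency = 4)
end(inferred_end)

# three observations, but start and end span six quarters:
# the values are recycled to fill the range
recycled <- ts(1:3,
               start = c(2020, 1),
               end = c(2021, 2),
               frequency = 4)
recycled
length(recycled)
```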

Now we will work through the typical process of converting data from a `data.frame` to a `ts` object.

```{r}
library(tidyverse)

# load the data
data("US_indicators")
str(US_indicators)
```

For now, we will only convert the vehicle sales into a `ts` object.

```{r}
tvs <- 
  US_indicators %>% 
  select(Date, `Vehicle Sales`) %>% 
  arrange(Date)

head(tvs)
```

Next, we need to define the starting (or ending) point of the series. In this case, the series starts in January 1976, so we can set `start = c(1976, 1)` directly. Alternatively, we can derive the starting point from the data:

```{r}
library(lubridate)

start_point <- c(year(min(tvs$Date)), month(min(tvs$Date)))
start_point
```

Now we build the series:

```{r}
tvs_ts <- ts(data = tvs$`Vehicle Sales`,
             start = start_point,
             frequency = 12)
```

One of the main limitations of the `ts` class is that its timestamp supports only two elements: the cycle and the cycle unit. For example, when we converted `tvs` into a `ts` object, we lost the day component because a monthly `ts` object stores only the year and the month.
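
The loss of the day component can be seen in a minimal base R sketch. The data frame `toy_df` and its values are hypothetical, and `format()` is used here instead of **lubridate** only to keep the example self-contained:

```{r}
# hypothetical month-end observations
toy_df <- data.frame(
  date  = as.Date(c("1976-01-31", "1976-02-29", "1976-03-31", "1976-04-30")),
  value = c(10, 20, 30, 40)
)

# make sure the rows are in chronological order
toy_df <- toy_df[order(toy_df$date), ]

# derive the start point (year, month) from the earliest date
first_date <- min(toy_df$date)
toy_start <- c(as.integer(format(first_date, "%Y")),
               as.integer(format(first_date, "%m")))

toy_month_ts <- ts(toy_df$value, start = toy_start, frequency = 12)

# the index is year + (month - 1) / 12; the day of the month is gone
time(toy_month_ts)
```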

## Creating an `mts` object

```{r}
US_indicators <- arrange(US_indicators, Date)

US_indicators_ts <- ts(data = select(US_indicators, `Vehicle Sales`, 
                                     `Unemployment Rate`),
                       start = c(year(min(tvs$Date)), month(min(tvs$Date))),
                       frequency = 12)

ts_info(US_indicators_ts)
```

## Setting the series frequency

Setting the frequency of a series sets the length of a cycle.

$$
\text{Frequency} = \frac{\text{cycle length}}{\text{time interval between observations}}
$$

In this example we will see how setting the frequency impacts the structure of the `ts` object output. First, we simulate close to ten years of daily data.

```{r}
daily_df <- data.frame(date = seq.Date(from = as.Date("2010-01-01"),
                                       length.out = 365 * 10, by = "day"),
                       y = rnorm(365 * 10, mean = 15, sd = 2))

str(daily_df)
```

Next, we create a `ts` object with a weekly cycle (`frequency = 7`), where the cycle unit is the day of the week:

```{r}
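# use only the numeric column; passing the whole data frame would also
# convert the date column and produce a two-column mts object
days_week_ts <- ts(daily_df$y,
                   start = c(1, wday(min(daily_df$date))),
                   frequency = 7)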

ts_info(days_week_ts)
```
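
To see how the choice of frequency changes the structure of the object, the sketch below applies the formula above to the same simulated daily values twice: once with a weekly cycle and once with a yearly cycle. The names and values are made up, the start points are arbitrary, and the yearly frequency of 365 ignores leap years for simplicity:

```{r}
set.seed(42)

# two years of simulated daily values
toy_daily <- rnorm(365 * 2, mean = 15, sd = 2)

# frequency = cycle length / time interval between observations
freq_weekly <- 7 / 1    # weekly cycle, daily observations
freq_yearly <- 365 / 1  # yearly cycle, daily observations

weekly_ts <- ts(toy_daily, start = c(1, 1), frequency = freq_weekly)
yearly_ts <- ts(toy_daily, start = c(2010, 1), frequency = freq_yearly)

# same values, different index: (week, day of week) vs (year, day of year)
end(weekly_ts)
end(yearly_ts)
```

The values are identical in both objects; only the cycle and cycle unit assigned to each observation differ.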

# Data manipulation of `ts` objects

## The window function

The main purpose of the `window()` function is to subset a `ts` object based on a time range. Its main arguments are `start` and `end`. Let's use `window()` to extract all the observations of the year 2005 from the NGC series:

```{r}
window(ngc, start = c(2005, 1), end = c(2005, 4))
```

We can also extract a specific frequency unit from the series. Say we're interested in extracting all the observations of the series that occurred in the third quarter of the year. This can be done by setting the starting point at the third quarter of the first year and the `frequency` to 1.

```{r}
window(ngc, start = c(2000, 3), frequency = 1)
```
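
One reason to prefer `window()` over plain integer indexing is that it keeps the time attributes of the series. A minimal sketch on a made-up quarterly series (`toy_q` is an illustrative name):

```{r}
# toy quarterly series, 2020 Q1 to 2022 Q4
toy_q <- ts(1:12, start = c(2020, 1), frequency = 4)

# subsetting by time range keeps the ts class and the time index
by_window <- window(toy_q, start = c(2021, 1), end = c(2021, 4))

# subsetting by position returns a plain vector without a time index
by_index <- toy_q[5:8]

is.ts(by_window)
start(by_window)
is.ts(by_index)
```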

## Aggregating `ts` objects

The `aggregate()` function splits the data into subsets, computes specific summary statistics, and then aggregates the results to a `ts` or `data.frame` object. Let's use `aggregate()` to transform the NGC series from a quarterly frequency to yearly:

```{r}
ngc_yearly <- aggregate(ngc, nfrequency = 1, FUN = "sum")
ngc_yearly
```
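
The same mechanics are easy to verify by hand on a small made-up series. In the sketch below (`toy_q` is an illustrative name), each year contains four quarters, so the yearly sums are 1 + 2 + 3 + 4 and 5 + 6 + 7 + 8. Changing `FUN` changes the summary statistic:

```{r}
# toy quarterly series, 2020 Q1 to 2021 Q4
toy_q <- ts(1:8, start = c(2020, 1), frequency = 4)

# quarterly to yearly totals
yearly_sum <- aggregate(toy_q, nfrequency = 1, FUN = sum)

# quarterly to yearly averages
yearly_mean <- aggregate(toy_q, nfrequency = 1, FUN = mean)

yearly_sum
yearly_mean
```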

## Creating lags and leads for `ts` objects

The `lag()` function from the **stats** package (this should not be confused with the `lag()` function from the **dplyr** package) can be used to create lags or leads for `ts` objects.

```{r}
ngc_lag4 <- stats::lag(ngc, k = -4)

ts_info(ngc_lag4)
```
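
Note that `stats::lag()` does not change the values of the series; it only shifts the time index. A negative `k` moves the index forward in time (a lag), and a positive `k` moves it backward (a lead). The following sketch on a made-up quarterly series (all names are illustrative) shows the effect, using `cbind()` to align the series by time:

```{r}
# toy quarterly series, 2020 Q1 to 2020 Q4
toy_q <- ts(c(10, 20, 30, 40), start = c(2020, 1), frequency = 4)

toy_lag1  <- stats::lag(toy_q, k = -1)  # index shifted one quarter forward
toy_lead1 <- stats::lag(toy_q, k = 1)   # index shifted one quarter backward

start(toy_lag1)
start(toy_lead1)

# aligning by time pads the non-overlapping periods with NA
aligned <- cbind(toy_q, toy_lag1, toy_lead1)
aligned
```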

# Visualizing `ts` and `mts` objects

## The `plot.ts()` function

Plotting a `ts` object:

```{r}
plot.ts(tvs_ts,
        main = "US Monthly Total Vehicle Sales",
        ylab = "Thousands of Vehicles",
        xlab = "Time")
```

Plotting an `mts` object:

```{r}
plot.ts(US_indicators_ts,
        plot.type = "multiple",
        main = "US Monthly Vehicle Sales vs. Unemployment Rate",
        xlab = "Time")
```

## The **dygraphs** package

The **dygraphs** package is an R interface to the `dygraphs` JavaScript charting library.

```{r}
p_load(dygraphs)

dygraph(tvs_ts,
        main = "US Monthly Total Vehicle Sales",
        ylab = "Thousands of Vehicles") %>% 
  dyRangeSelector()
```

For the `US_indicators_ts` series, we will add a second *y*-axis, which allows us to plot and compare the two series that are not on the same scale:

```{r}
dygraph(US_indicators_ts,
        main = "US Monthly Vehicle Sales vs. Unemployment Rate") %>% 
  dyAxis("y", label = "Vehicle Sales") %>% 
  dyAxis("y2", label = "Unemployment Rate") %>% 
  dySeries("Vehicle Sales", axis = "y", color = "green") %>% 
  dySeries("Unemployment Rate", axis = "y2", color = "red") %>% 
  dyLegend(width = 400)
```

## The **TSstudio** package

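The `ts_plot()` function from **TSstudio** creates interactive plots of `ts` objects. Setting `slider = TRUE` adds an interactive slider for the *x*-axis:

```{r}
p_load(TSstudio)

ts_plot(tvs_ts,
        title = "US Monthly Total Vehicle Sales",
        Ytitle = "Thousands of Vehicles",
        slider = TRUE)
```

For an `mts` object, setting `type = "multiple"` plots each series in its own panel: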

```{r}
ts_plot(US_indicators_ts,
        title = "US Monthly Vehicle Sales vs. Unemployment Rate",
        type = "multiple")
```
